06. Importance of EDA

Key Points

Where EDA fits into CRISP-DM

Data Schema Analysis

EDA: Exploratory Data Analysis

EDA is a step in the data science process that is often overlooked in favor of the modeling and evaluation phases, which can be easier to quantify and benchmark.

CRISP-DM: This stands for “Cross-Industry Standard Process for Data Mining”. It is a common framework for data science projects and consists of a series of steps from business understanding to deployment.

EDA and CRISP-DM

As the image above shows, EDA falls in the Data Understanding phase of CRISP-DM.

Additional Resources

CRISP-DM
What is Exploratory Data Analysis
EDA in Python

Reasons EDA is important

  • EDA can help you discover features or data transformations/aggregations that might introduce data leakage. Catching these early can save a tremendous amount of time and prevent you from building a flawed model.
  • EDA can help you better translate and define modeling objectives and the corresponding evaluation metrics from both a machine learning/data science perspective and a business perspective.
  • EDA can help inform strategies for handling missing, null, or zero-valued data. Missing values are a common issue in EHR data, and you will have to choose imputation strategies accordingly.
  • EDA can help identify subsets of features to use for feature engineering and modeling, along with appropriate feature transformations based on feature type (e.g. categorical vs. numerical features).
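
As a rough illustration of the last two points, the sketch below counts missing and zero values per column and makes a first guess at whether each column is categorical or numerical. The records, column names, and the `summarize_columns` helper are all hypothetical and are not taken from the course materials. The sketch uses only the Python standard library so that it runs anywhere; in practice a library such as pandas is the usual tool for this kind of summary.

```python
# Hypothetical toy records standing in for EHR rows; None marks a missing value.
records = [
    {"age": 64, "gender": "F", "weight_kg": 70.5, "readmitted": 0},
    {"age": None, "gender": "M", "weight_kg": 0.0, "readmitted": 1},
    {"age": 51, "gender": None, "weight_kg": 82.0, "readmitted": 0},
    {"age": 47, "gender": "F", "weight_kg": None, "readmitted": 1},
]


def summarize_columns(rows):
    """Return per-column missing/zero counts, a rough type, and distinct count."""
    columns = sorted({key for row in rows for key in row})
    summary = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Treat a column as numerical only if every present value is a number.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        summary[col] = {
            "missing": len(values) - len(present),
            "zeros": sum(1 for v in present if numeric and v == 0),
            "kind": "numerical" if numeric else "categorical",
            "distinct": len(set(present)),
        }
    return summary


summary = summarize_columns(records)
for col, stats in summary.items():
    print(col, stats)
```

The type guess here is based only on the Python type of the values, so a binary-coded column such as the hypothetical `readmitted` is reported as numerical even though it may be better treated as a categorical label. The distinct-value count is included to help spot such cases. A zero in a column such as `weight_kg` is also worth a second look, since it may be a placeholder for a missing measurement rather than a real value.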

EDA Quiz 1

In which part of the CRISP-DM cycle does EDA usually fall?

SOLUTION: Data Understanding

EDA Quiz 2

Which of the following is not a reason that EDA is important?

SOLUTION: EDA is a way to leak data into a dataset.